Emotion-Aware Speaker Identification with Transfer Learning

Authors

Abstract

Speech is a natural communication method for humans, and speaker identification (SI) technology based on human speech has served as an entry point for many human-computer interaction applications. The performance of SI models can degrade when dealing with expressive speech uttered in emotional situations, because emotion databases do not contain sufficient data to train on the various emotional states; models are generally trained with relatively more "neutral" samples than samples of other classes. In this study, we propose emotion-aware SI (em-SI), which uses an emotion-embedding vector generated by a pre-trained speech emotion recognition (SER) model along with the acoustic features of the data. We assess the method on individual English and Korean corpora and confirm that the proposed approach provides improved performance on multilingual corpora. The evaluation results show that the accuracy of em-SI on the Korean Emotion Multimodal Database (KEMDy19) improved by 3.2%, and the average speaker verification (SV) performance in terms of equal error rate (EER) improved by 1.3%, compared with the baseline model. Visualization of the embedding vectors shows that the model maps speech into a space where both speaker and emotion information are simultaneously represented. Through the experiments conducted, we confirmed that the proposed model, which learns by integrating emotion information, performs robustly on emotional speech.
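The fusion step the abstract describes can be illustrated with a minimal sketch: an emotion-embedding vector from a (here simulated) frozen SER encoder is concatenated with pooled acoustic features before speaker classification. All dimensions, the random-projection "encoder", and the untrained linear classifier below are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- the paper does not state these exact dimensions.
N_FRAMES, N_MFCC = 200, 40   # acoustic frames x features (e.g. MFCCs) per utterance
EMOTION_DIM = 64             # emotion-embedding size from the pre-trained SER model
N_SPEAKERS = 10

def emotion_embedding(utterance_feats: np.ndarray) -> np.ndarray:
    """Stand-in for a frozen, pre-trained SER encoder: a fixed random
    projection of the utterance's mean acoustic frame."""
    W_ser = rng.standard_normal((utterance_feats.shape[1], EMOTION_DIM))
    return utterance_feats.mean(axis=0) @ W_ser

# One utterance: pool the acoustic frames, then concatenate the emotion
# vector to obtain the emotion-aware SI input described in the abstract.
acoustic = rng.standard_normal((N_FRAMES, N_MFCC))
pooled = acoustic.mean(axis=0)               # shape (40,)
em_vec = emotion_embedding(acoustic)         # shape (64,)
si_input = np.concatenate([pooled, em_vec])  # shape (104,): both kinds of information

# A linear speaker classifier over the fused vector (weights untrained,
# purely to show where the fused representation is consumed).
W_cls = rng.standard_normal((si_input.size, N_SPEAKERS))
speaker_logits = si_input @ W_cls
predicted_speaker = int(np.argmax(speaker_logits))
print(si_input.shape, predicted_speaker)
```

The key design point is that the SER encoder stays frozen, so the SI model learns to exploit emotion cues without needing large emotional SI training sets.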


Related articles

Feature Transfer Learning for Speech Emotion Recognition

Speech Emotion Recognition (SER) has achieved some substantial progress in the past few decades since the dawn of emotion and speech research. In many aspects, various research efforts have been made in an attempt to achieve human-like emotion recognition performance in real-life settings. However, with the availability of speech data obtained from different devices and varied acquisition condi...


Cost-Sensitive Learning for Emotion Robust Speaker Recognition

In the field of information security, voice is one of the most important parts in biometrics. Especially, with the development of voice communication through the Internet or telephone system, huge voice data resources are accessed. In speaker recognition, voiceprint can be applied as the unique password for the user to prove his/her identity. However, speech with various emotions can cause an u...


Integrating speaker identification and learning with adaptive speech recognition

Presently, speaker adaptive systems are the state-of-the-art in automatic speech recognition. A general baseline model is adapted to the current speaker during recognition in order to improve the quality of the results obtained. However, the adaptation procedure needs to be able to distinguish between data from different speakers. Therefore, in a general speaker adaptive recognizer speaker recog...


Speaker Characteristics and Emotion Classification

In this paper, we address the — interrelated — problems of speaker characteristics (personalization) and suboptimal performance of emotion classification in state-of-the-art modules from two different points of view: first, we focus on a specific phenomenon (irregular phonation or laryngealization) and argue that its inherent multi-functionality and speaker-dependency makes its use as feature i...


Speaker Clustering in Emotion Recognition

Speaker variability is a known challenge for emotion recognition, however little work has been done on speaker similarity in terms of its contribution to the performance in the emotion classification task. In this paper, we investigate this topic, and find a clear link between speaker proximity and the recognition accuracy. Motivated by this result, emotion based speaker clustering is proposed ...



Journal

Journal title: IEEE Access

Year: 2023

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2023.3297715